Data-driven spatiotemporal assessment of the event-size distribution of the Groningen extraction-induced seismicity catalogue

For induced seismicity, the non-stationary, heterogeneous character of subsurface stress perturbations can be a source of spatiotemporal variations in the scaling of event sizes; one of the critical parameters controlling seismic hazard and risk. We demonstrate and test a systematic, statistical, penalized-likelihood approach to analysing both spatial and temporal variations in event size distributions. The methodology used is transferable to the risk analysis of any subsurface operation, especially for small earthquake catalogues. We explore the whole solution space and circumvent conventional, arbitrary choices that require a priori knowledge of these variations. We assess the effect of possible bias in the derivation, e.g., due to tapering of the earthquake-size distribution, correlation between the b-value and the magnitude of completeness and correlation between the b-value and the largest magnitude observed. We analyse the spatiotemporal variations in the earthquake-size distribution of the Groningen induced seismicity catalogue (December 1991–November 16, 2021). We find statistically significant spatial variations without any compelling, statistical evidence of a temporal variation. Furthermore, we find that the largest magnitudes observed are inconsistent with the sampling statistics of an unconstrained earthquake-size distribution. Current risk assessment models likely overestimate the probability of larger magnitude events (M ≥ 3.0) and thus the risk posed.


Scientific Reports
| (2022) 12:10119 | https://doi.org/10.1038/s41598-022-14451-z www.nature.com/scientificreports/ In this paper we demonstrate and test a systematic approach to analysing both spatial and temporal variations in event size distributions. We assess whether spatiotemporal variations in the earthquake-size distribution exist in the Groningen induced seismicity catalogue. First, we analyse possible spatial variations in the b-value. To circumvent the arbitrary choices in external mapping parameters, we adopt the method introduced by Kamer and Hiemer 20 . We modify and expand the methodology to include the temporal dimension in the analysis. Throughout our analysis we will systematically explore the effect of possible sources of bias on the derived spatiotemporal variations. Our methodology is transferable and applicable to any induced-seismicity setting. It is particularly relevant for relatively small earthquake catalogues, and provides a rigorous and unbiased approach to untangling potential spatiotemporal variations and their physical meaning. At the same time, our findings could have consequences for future hazard and risk calculations for the Groningen gas field, and help inform important decision-making processes.

Data
Natural gas has been produced from the Groningen gas field since 1963. At present, about 70% of the estimated 2800 × 10 9 m 3 initial gas in place has been produced, dropping the initial mean pore pressure by about 25 MPa. The field is located in the sandstones of the Rotliegend formation, which is overlain by a thick layer of Zechstein halite and anhydrite salt deposits 25 . The reservoir is highly faulted with over 1100 mapped, steeply dipping normal (extensional) faults (Fig. 1). We refer the reader to 25 for more detailed information on the geology of the Groningen gas field.
The earthquakes in the Groningen gas field are induced by gas extraction at a depth of approximately 3 km 33 and have relatively small magnitudes ( M l ≤ 3.6 ) ( Fig. 1 22,33 ). A local geophone network with a detection threshold of local magnitude M l = 1.5 was installed in 1995 22 . In 2015, the geophone network was significantly extended to increase the detection of small magnitude earthquakes 22 . In total, 1396 events were detected between December 1991 and November 16, 2021, ranging from local magnitudes M l = − 0.5 to 3.6, with fourteen events M l ≥ 3.0 (e.g. 22 ). In this paper, we have used the catalogue reported by the KNMI (www. knmi. nl), from which we have selected all events within the outline of the Groningen gas field (Fig. 1). Note that in this catalogue a hypocentre depth of 3 km has been assumed a priori 22 .

Method
Earthquake-size distribution. The relation between the cumulative number of earthquakes (N) and magnitude (M) follows a power-law distribution expressed as log 10 34 , where M c is the magnitude of completeness and a and b (so-called b-value) are constants that describe the productivity and the relative size distribution, respectively.
In the analysis presented in this paper, the b-value was determined with the maximum likelihood method 35,36 , following Kagan 21 . In order to avoid bias in the b-value estimates 16 , we implemented a correction for magnitude binning 16 and small sample sizes 37 . The (regional) magnitude of completeness was calculated with the maximum curvature method (MCM 38 ). The advantage of the MCM is that results can be obtained fast and reliably, even for small sample sizes. On the other hand, the method tends to underestimate M c , especially for gradually-curved frequency-magnitude distributions. This disadvantage can be overcome by using a correction factor ( M c = M c (MCM) + �M c ) in combination with the bootstrap approach 38 . After careful assessment of the (regional) MCM results, while increasing the correction factor ( M c ), an initial correction factor of M c = 0.2 was adopted.
Penalized likelihood-based method. The classical spatial b-value mapping technique 17,18 depends heavily on external parameters for which the parameter-value choices require a priori knowledge of the spatial event-size distribution that one wishes to resolve in the first place 19 . The penalized likelihood-based method of Kamer and Hiemer 20 addresses these limitations. This parameter-free method is based on optimal partitioning using Voronoi tessellation, thereby exploring the whole solution space. The method uses a penalized-likelihood approach and the wisdom of the crowd philosophy 20 .
Voronoi tessellation partitions the space using a set of points (nodes) and assigns each node its nearest neighbourhood region. By random perturbation of the nodes, arbitrarily shaped and sized regions are obtained. The approach thus allows for a flexible, non-overlapping partitioning in space. The overall log-likelihood of each random tessellation can be computed by estimating the b-value in all Voronoi regions, computing the log-likelihood of each region, and subsequently summing the log likelihoods of the Voronoi regions 20 .
The overall log-likelihoods were subsequently penalized based on the number of free parameters by using the Bayesian Information Criterion (BIC) 39 , given by BIC = −log L + k 2 logN , where L is the overall likelihood, k represents the number of free parameters and N denotes the number of data points. Finally, all models were ranked by their BIC and the median BIC weighted ensemble model was calculated using the solutions outperforming a chosen null hypothesis, with a maximum of the 1000 best solutions. Thus, models of different complexity but with similar BIC had equal influence on the ensemble inference, which is a manifestation of the wisdom of the crowd philosophy.
We discretized the Voronoi selection space, in accordance with the event location uncertainties, in 2.5 × 2.5 km cells (Fig. 2a). Only cells with at least two events M l ≥ M c were considered a potential Voronoi node location (red dots in Fig. 2a). Each set of nodes partitioned the assessment space in anisotropic regions of various shapes and sizes. Increasing the total number of nodes allowed for the exploration of smaller scale variations.
The assessment of a single Voronoi cell, i.e., having no spatial variation, was regarded as the null hypothesis. Subsequently, the number of nodes considered in an analysis was increased from 2 to 50, performing 2000 random tessellations at each step. For each Voronoi cell in each tessellation, both the M c and the b-value were If no M c could be derived due to too small a sample size the M c of the null-hypothesis was adopted (Table 1). This occurred in only 2% of the estimations and as such did not have a substantial impact on the final results.
To derive temporal b-value variations devoid of choices that require a priori knowledge, we have adapted the spatial approach; a non-straightforward advance. We discretized the temporal selection space in 5-year cells, offset with respect to the analysis times by 0.33 years. This offset ensured that cell partitions did not align with the temporal analysis locations. Only cells or combinations of neighbouring cells with at least 20 events M l ≥ M c were considered a potential node location. We increased the number of minimum events relative to the spatial assessment, because the discretization was only one-dimensional (in time). This one-dimensionality of our temporal problem increased the probability that neighbouring cells were selected, and the minimum number of events would form the basis of the assessment of the cell. A very low number of minimum events would then introduce an extreme bias due to very small sample sizes.
Similarly to the spatial analysis, we again regarded the assessment of a single cell, i.e. having no temporal variation, as the null hypothesis. The number of nodes (n n ) considered was increased from 2 to (n t n − 1) , where n t n was the total number of potential node locations. By limiting the minimum number of events to 20, the temporal tessellation resulted in a limited number of potential node locations. Hence, it was important to avoid www.nature.com/scientificreports/ superfluous repetition of particular tessellations, as these would bias our results. The maximum number of unique random tessellations is limited by both the number of nodes considered (n n ) and the total number of potential node locations (n t n ) . The number of unique random tessellations of n t n nodes when considering n n nodes ( n t n T n n ) can be computed by n t n T n n = n t n !/n n ! n t n − n n ! . In our analysis, we wanted to ensure all unique tessellations were considered, while avoiding superfluous repetitions; therefore, we restricted the total number of random tessellations at each step to 2 n t n T n n with a maximum of 2000.

Results
Spatial variations in earthquake-size distribution. We first investigated possible spatial variations in the b-values in the Groningen gas field. The median b-values derived with the penalized likelihood-based method are shown in Fig. 2b and range from 0.77 to 1.52. We observe a very systematic division between low b-values in the north-northwest of the field (i.e., a relative abundance of larger earthquakes) and higher b-values in the west and east (i.e., a relative abundance of smaller earthquakes; Fig. 2d). The high b-value region in the east is less well defined and associated with a large interquartile range (Fig. 2c).
To complement the assessment of the statistical significance of this spatial pattern, we derived regions of comparable b-values from our spatial solution. The nodes assigned to each region are indicated in Fig. 2a by the different node colours. For each region, we computed the average regional b-value and M c based on all enclosed events (Table 1; Fig. 2e). The b-values obtained for the southwest (SW) and central-eastern (CE) regions are slightly larger than determined in the Voronoi analysis. This is consistent with observations of Kamer and Hiemer 20 , that b-values in a high b-value area may be underestimated by the Voronoi approach. We use the two sample, left-tailed t-test or Welch's test 40 to assess whether the derived regional b-value distributions are in fact samples of the same, larger distribution (e.g., the single distribution for the full Groningen catalogue): where b 1 and b 2 are the derived b-value estimates, s 1 and s 2 are the standard deviations of the two estimates, and n 1 and n 2 are the sample sizes. We find that the probability that the regional b-values of the northwest (NW), SW and CE regions are samples of a single b-value distribution, is less than 5%. This further confirms that the obtained spatial distribution of the b-value is statistically significant at the 95% confidence level.
Bias introduced by tapering of the earthquake-size distribution. The limited capability to accumulate seismic energy in one specific region or on a single fault requires the earthquake-size distribution to decay stronger above a particular magnitude called the corner magnitude M co (Fig. 2e). This tapering of the distribution may introduce a bias in the estimation of the b-value 16,32,41,42 . As the Groningen catalogue is limited in magnitude range with M c ranging from 0.8 to 1.5 22 and a maximum observed magnitude of M l =3.6, our derived regional, spatial b-value estimates may be influenced. Here, we explored this possible bias by jointly deriving the maximum likelihood estimates of the b-value and M co for the tapered distribution (Table 1 21 ).
We find that in all our analyses, the b-value of the tapered estimation is systematically lower than the b-value derived in the non-tapered estimation. This is directly related to the significant positive correlation between the estimation of the corner magnitude and the b-value: an increase in the corner magnitude estimate is compensated by an increase in the b-value estimate 21 . Kagan 21 further showed that the correlation coefficient increases if the difference between M co and M c is small.
We used the corrected Akaike Information Criterion (AICc 43 ) to assess and compare the fit of both the tapered and non-tapered models to the data (Table 1). In most regions, the relative probability is comparable or Table 1. Overview of the parameter estimates of the (tapered) earthquake-size distributions and the Akaike Information Criterion corrected for small sample sizes (AICc) test statistics. N indicates the number of events with M l ≥ M c , M obs max is the magnitude of the largest event in the dataset, b is the b-value estimate with its 1 − σ uncertainty, M co is the estimated corner magnitude with its 1 − σ uncertainty, and ν is the coefficient of variation. Regions (see Fig. 2a www.nature.com/scientificreports/ the non-tapered model is favoured. Only for the full catalogue the AICc of the tapered model is slightly lower than the AICc of the non-tapered model. Another estimate of the probability of the non-tapered model rejection can be obtained by analysing the coefficient of variation ( ν in Table 1 21 ). If the coefficient of variation is close to, or larger than, 0.5, the non-tapered model ( M co → ∞ ) is within the 97.5% confidence interval. Based on this test, the non-tapered model cannot be rejected in any of the analyses. However, the coefficient of variation for the full (0.59) and NW-region (0.64) catalogues are just outside the 97.5% confidence interval. We conclude that our relatively small earthquake catalogues yield little to no statistical information on the presence of a taper. Possible bias due to tapering of the earthquake-size distribution on the non-tapered b-value estimates can be regarded negligible. However, our results raise the question whether the earthquake magnitudes are as large as statistically expected? We will investigate this further after the assessment of possible temporal variations in the earthquake-size distribution.
Temporal variations in earthquake-size distribution. Following the analysis of the spatial variations, we also assessed the possible presence of temporal variations in the earthquake-size distribution. Initially, we attempted to extend the spatial analysis by adding the temporal dimension. However, due to the heterogeneous spatiotemporal development of the Groningen seismicity, this did not render meaningful results. Therefore, we have adapted the penalized likelihood-based method for the temporal domain (see "Method" section) and applied this to the full Groningen catalogue. The results of our assessment for the non-tapered solution are shown in Fig. 3a. The results for the b-value and corner magnitude of the tapered solution are shown in Fig. 3b,c, respectively. In both solutions, we obtained a slightly decreasing b-value with time, which seems insignificant at the 90% confidence level. At the same time, To complement the assessment, we split the dataset into two catalogues of approximately equal size. The first catalogue contained only events prior to 2014, the second only events from 2014 to 2022. For each dataset we derived the tapered and non-tapered solutions (Table 1). We find, by comparing the AICc's, that the events prior to 2014 (Table 1) may be better described by a non-tapered distribution, with a slightly larger b-value of 1.00 ± 0.08. The second half of the dataset (between 2014 and 2022) are described equally well by the tapered and non-tapered distributions. The two sample, left-tailed t-test confirms that the b-value distributions for the two periods may be samples of the same, larger distribution for the full dataset (the difference is insignificant at the 90% confidence level).

Bias due to correlation between the b-value and the largest magnitude.
Even though the maximum likelihood estimation takes into account only the small magnitudes in the catalogue and not the largest, a correlation between the b-value and the largest magnitude of the dataset may remain, especially for relatively small datasets. This is directly related to the fact that the mean of an exponential distribution is sensitive to outliers. As a consequence, the b-value decreases as the largest magnitude in the dataset increases 16 . Figure 4b shows a plot of the b-values as a function of the largest magnitude in the dataset analysed. The results of the Kendall-Tau significance test 44 for the correlation are shown. A very small, but significant negative correlation between the b-value and the largest magnitude in the dataset was obtained. This suggests that we observe a slightly lower b-value when the dataset contains at least one stronger event. However, the timing of the observed decrease (between 2010 and 2015) does not correlate with the observed increase in larger magnitude events (in 2003 and 2006; Fig. 3e). We conclude that the observed apparent decrease in b-value is not related to the sensitivity of the maximum likelihood estimate due to the onset of M l ≥ 3.0 events in 2003.

Bias due to correlation between the b-value and the magnitude of completeness.
Incompleteness of the earthquake catalogue at low magnitudes can introduce a significant bias on the estimation of the b-value: underestimating M c leads to an underestimation of the b-value 16 . Figure 4a shows the b-values derived in the solutions used to compute the temporal values, shown in Fig. 3a, as a function of the magnitude of completeness in the datasets analysed. The results of the Kendall-Tau test 44 are again shown (Fig. 4a). A significant positive correlation between the b-value and the magnitude of completeness was obtained. Closer examination (Fig. 3d) shows that the period of low b-values and low M c occurs predominantly after 2014. The decrease in M c is directly related to a significant extension of the seismic monitoring network and is consistent with previous assessments for this time period 22 . In addition, Fig. 4a shows that the correlation between the b-values and M c for the period as of 2014 (grey dots) is much less pronounced. Therefore, we conclude that the observed correlation is not the result of a bias due to underestimation of M c .
Are the earthquake magnitudes as large as statistically expected? Our results raise the question whether there exists an intrinsic limit on the maximum size of the induced earthquakes in Groningen. Following Van der Elst et al. 45 , we computed the statistically expected maximum magnitude range for each of the subcatalogues with time. In Fig. 5, the largest magnitude observed is plotted as a function of the largest magnitude www.nature.com/scientificreports/ expected. For the full Groningen catalogue, we find that the largest magnitude observed is consistently low compared to what could be expected, but just within the 90% confidence range. For the NW-region, we find that, as of 2005, the observed maximum magnitude is significantly smaller than statistically expected and falls outside the 90% confidence range of the expected distribution. Prior to 2005, and in the SW and CE regions, the observed magnitudes are as large as can be statistically expected.

Discussion
Our results suggest that there is clear, statistically significant evidence for spatial variations in the earthquake-size distribution of the induced seismicity sequence of Groningen. We obtained low b-values of ~ 0.8 in the northwest (NW) of the field and high b-values of ~ 1.5 in the south-western and eastern parts of the field (Table 1). We found no compelling, statistical evidence of a temporal variation of the b-values in the Groningen gas field. Further, our results clearly showed that our relatively small earthquake catalogues yield little to no statistical information on the presence of a taper. In most analyses, the relative probability of the two models was comparable, or the non-tapered model was favoured. Only for the full catalogue, the AICc of the tapered model was slightly lower than the AICc of the non-tapered model with a coefficient of variation of 0.59, just outside the Figure 5. Plot of the temporal development of the largest magnitude observed as a function of expected maximum magnitude. The horizontal error bars show the 90% confidence ranges for the expected maximum magnitude. On the diagonal, dotted line the largest magnitude observed and expected would be identical. If the 90% confidence range does not overlap with the diagonal line, the probability that the largest magnitude observed scales logarithmically with the number of earthquakes is less than 5%. In all analyses, the onset of the time interval is 1-1-1990. The colour bar indicates the year at the end of the time interval assessed: 1-1-year. The data in the most recent interval extends to 16-11-2021, i.e., the end of the catalogue used. Note that in all panels some of the time intervals are not visible as the observed and expected maximum magnitudes are (almost) identical to the following time intervals and for the CE-region no events were observed prior to 1-1-1995 (hence the dot at (0,0)). www.nature.com/scientificreports/ 97.5% confidence interval. Also for the NW-region, the coefficient of variation of 0.64 was close to this interval. This hints towards an area-characteristic corner or maximum possible magnitude. The Groningen dataset contains no M l ≥ 3.0 events prior to 2003. However, this is not remarkable. Based on the sampling statistics of the earthquake-size distribution, the largest observed earthquake up to 2005 in all regions is consistent with the largest magnitude expected (Fig. 5). Figure 5 also shows that in all regions, the largest magnitude observed has increased with time. This is inconsistent with the case where the earthquake size is determined by natural tectonics, as in that case each earthquake would have the same probability of becoming the largest 45 . In fact, production from the Groningen gas field occurred for 20-25 years without any seismicity being recorded. Since the onset of seismicity, the increases in both the number of events and the largest magnitude observed are consistent with a progressive destabilization of the pre-existing faults under increasing Coulomb stress due to reservoir depletion 46 .
We found statistical evidence of a stronger decay of the occurrence probability for the larger magnitude events as the catalogue increases (Fig. 5). The observed largest magnitudes after 2005 were significantly lower and at the edge of the 90% confidence range of the expected maximum magnitude. This means that the probability that the largest magnitude observed scales logarithmically with the number of earthquakes, is approximately 5%. Figure 5 also shows that the occurrence of a larger magnitude event later in time can mean that the largest magnitude observed again approximates or falls just within the 90% confidence range of the maximum magnitude expected (e.g., due to the M l 3.5 event in 2006). We therefore found it important to investigate if a future event, significantly larger than the current maximum observed, would result in a similar reconciliation. In both the full Groningen and NW-region, we extended the sub-catalogue with an additional fictitious event ( M l 4.0 ) to occur on the first of July 2022. The expected maximum magnitudes are given by the open triangles in Fig. 5. In the NW-region, the fictitious observed M l 4.0 still falls outside the 90% confidence range of the corresponding expected maximum magnitude. For the full Groningen catalogue, the fictitious M l 4.0 event is still on the low side, but would fall just within the 90% confidence range of the expected maximum magnitude. This difference can be mostly explained by the significantly larger b-value obtained for the full catalogue compared to the NW region, which lowers the probability of LME's (Table 1). This analysis further confirms that, after taking into consideration the statistically significant spatial variations of the b-value, only the data from the NW-region contains some statistical information on the position of a possible corner magnitude or maximum possible magnitude, which could provide an upper bound to the earthquake-size distribution. Our analysis again shows how notoriously difficult it is to constrain a magnitude bound or taper from observed seismicity alone 47,48 .
In 2016, a conditional distribution for the maximum possible magnitude ( M max ), based on statistical and model considerations, was published for Groningen 49 . The distribution extends from M l 3.8 to M l 7.2 , with a weighted mean of M l 5.0 . Our results for the tapered earthquake-size distribution of the Groningen catalogue and NW-region hint at a possible corner magnitude of M l 3.4 − 3.5 (Table 1; Fig. 2e). However, it is well known that this is an underestimate due to bias for small samples 21 . For the Groningen case, the underestimation will be of the order of 0.1-0.15 magnitude points 21 . Thus, we obtain a bias-corrected corner magnitude estimate of M l 3.5 − 3.6 , which would correspond to an equivalent truncation of the earthquake-size distribution at about M l 4.1 . If this would be considered an area-characteristic corner magnitude for the Groningen gas field, this would also explain the absence of any statistical information on the corner magnitude in most regions. After all, only a single event exceeding M l 3.0 ( M l 3.1 at Hellum on September 30, 2015) was observed outside the NWregion. Therefore, our results seem to suggest that the relative probability of the lower end of the current M max distribution 49 should be increased, as the larger maximum possible magnitudes would not affect the probability of events with magnitudes as low as M l 3.0 − 3.5 . Based on a variety of methods, Beirlant et al. 48 concluded that the area-characteristic M max of the Groningen gas field should be in the range 3.61-3.8, with a 90% confidence upper bound of 3.85 to 4.5. Our results are consistent with the upper end of their estimates.
Finally, we note that in our analysis we reach the limits of the information that can be extracted from the data. The number of events available is very limited and thus the derived b-values prone to large uncertainties and bias. In the adopted systematic approach, we have taken great care to minimize the bias as much as possible, but cannot exclude that some bias due to the small sample sizes remains. Given the implications of the earthquake-size distribution on risk estimates, this emphasizes the great importance of early-stage, dedicated, high resolution monitoring of anthropogenic seismicity, to ensure large enough databases for accurate and robust statistical analyses.

Conclusions
Our results show statistically significant spatial variations of the earthquake-size distribution in the induced seismicity sequence of Groningen. The probability of larger magnitude events in the NW-region is statistically significantly larger than in the southern and eastern parts of the gas field. These spatial variations will affect the regional probability of larger magnitude events and could be incorporated as a viable model alternative in the Groningen seismic hazard and risk assessment. We find no compelling, statistical evidence of a temporal variation.
Our analysis further shows that the occurrence probability of events with magnitudes exceeding M l 3.0 is lower than expected. Our results are consistent with the presence of an area-characteristic corner magnitude in the Groningen gas field around M l 3.5 or a maximum possible magnitude around M l 4.1 . Our results imply that the current risk assessment models, which use the conditional M max distribution 49 , overestimate the probability of larger magnitude events (M ≥ 3.0) in the Groningen gas field and thus potentially the risk posed.
However, we emphasize that an upper bound should better not be inferred based on the limited Groningen catalogue alone. Especially for relatively small datasets, it is well known that these data assessments are prone Scientific Reports | (2022) 12:10119 | https://doi.org/10.1038/s41598-022-14451-z www.nature.com/scientificreports/ to the presence of bias. This does not disqualify the assessments, but it is useful and vital to systematically study the sources of bias, as we have done in this paper. The here presented, systematic and unbiased spatiotemporal analysis procedure does not only apply to the Groningen gas field, but is transferable to, and of the utmost relevance for, accurate risk analysis of any subsurface operation potentially causing induced seismicity, especially when the corresponding earthquake catalogues are relatively small. This potentially includes, but is certainly not limited to, geothermal energy, CO 2 capture, sequestration and utilization, and hydrogen storage operations. All highly relevant in the context of the global energy transition.
Specific to the Groningen case, our conclusions can be considered a valuable addition to already existing evidence of either a tectonic or a reservoir limit on the maximum possible magnitude for the Groningen gas field, as was the case in the study by NAM 49 , especially considering the implications on hazard and risk estimates.

Data availability
Data used to produce the results of this study are freely available at the KNMI via https:// www. knmi. nl/ kennisen-datac entrum/ datas et/ aardb eving scata logus (Only available in Dutch).